Dataset statistics
| Number of variables | 17 |
|---|---|
| Number of observations | 1000 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 132.9 KiB |
| Average record size in memory | 136.1 B |
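As a rough consistency check, the average record size is the total in-memory size divided by the number of observations. The sketch below assumes 1 KiB = 1024 bytes. Because the reported total is itself rounded, agreement is only expected at the displayed precision.

```python
# Figures copied from the "Dataset statistics" table above
total_size_kib = 132.9   # "Total size in memory"
n_rows = 1000            # "Number of observations"

# Assumption: 1 KiB = 1024 bytes and average = total bytes / row count
avg_record_bytes = total_size_kib * 1024 / n_rows
print(f"{avg_record_bytes:.1f} B")  # 136.1 B
```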
Variable types
| Categorical | 9 |
|---|---|
| Numeric | 7 |
| DateTime | 1 |
Alerts
| Alert | Type |
|---|---|
| Porcentagem margem bruta has constant value "4.761904762" | Constant |
| ID_fatura has a high cardinality: 1000 distinct values | High cardinality |
| Horario has a high cardinality: 506 distinct values | High cardinality |
| Preco unitario is highly correlated with Imposto and 3 other fields | High correlation |
| Quantidade is highly correlated with Imposto and 3 other fields | High correlation |
| Imposto is highly correlated with Preco unitario and 4 other fields | High correlation |
| Total is highly correlated with Preco unitario and 4 other fields | High correlation |
| Custo mercadoria is highly correlated with Preco unitario and 4 other fields | High correlation |
| Renda bruta is highly correlated with Preco unitario and 4 other fields | High correlation |
| Nota da experiência de compra is highly correlated with Porcentagem margem bruta | High correlation |
| Filial is highly correlated with Cidade | High correlation |
| Cidade is highly correlated with Filial | High correlation |
| Tipo de cliente is highly correlated with Porcentagem margem bruta | High correlation |
| Sexo is highly correlated with Porcentagem margem bruta | High correlation |
| Linha de produtos is highly correlated with Porcentagem margem bruta | High correlation |
| Pagamento is highly correlated with Porcentagem margem bruta | High correlation |
| Porcentagem margem bruta is highly correlated with Cidade and 5 other fields | High correlation |
| ID_fatura is uniformly distributed | Uniform |
| Horario is uniformly distributed | Uniform |
| ID_fatura has unique values | Unique |
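Many of the High correlation alerts among the monetary columns would be expected if those columns are fixed multiples of one another. The means reported in the per-variable sections are consistent with Imposto = 0.05 × Custo mercadoria, Total = 1.05 × Custo mercadoria and Renda bruta = Imposto. This is an inference from summary statistics, not something the report states. The sketch below only checks ratios of the reported means; a row-by-row check on the raw data would be needed to confirm it.

```python
import math

# Means copied from the per-variable sections of this report
mean_custo = 307.58738    # Custo mercadoria
mean_imposto = 15.379369  # Imposto
mean_total = 322.96682    # Total
mean_renda = 15.379369    # Renda bruta

# Hypothesised relationships (inferred, not stated by the report):
#   Imposto = 0.05 * Custo mercadoria
#   Total = Custo mercadoria + Imposto = 1.05 * Custo mercadoria
#   Renda bruta = Imposto
checks = {
    "Imposto / Custo": mean_imposto / mean_custo,
    "Total / Custo": mean_total / mean_custo,
    "Renda bruta / Imposto": mean_renda / mean_imposto,
}
for name, ratio in checks.items():
    print(f"{name}: {ratio:.6f}")
```

Matching means are necessary but not sufficient for an exact per-row relationship. The identical coefficients of variation reported for Imposto, Total and Custo mercadoria (about 0.76133) are what proportional scaling would also produce.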
Reproduction
| Analysis started | 2023-06-27 11:27:44.567770 |
|---|---|
| Analysis finished | 2023-06-27 11:28:11.038819 |
| Duration | 26.47 seconds |
| Software version | pandas-profiling v3.4.0 |
ID_fatura
| Distinct | 1000 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.9 KiB |

| 750-67-8428 | 1 |
|---|---|
| 642-61-4706 | 1 |
| 816-72-8853 | 1 |
| 491-38-3499 | 1 |
| 322-02-2271 | 1 |
| Other values (995) | 995 |
Length
| Max length | 11 |
|---|---|
| Median length | 11 |
| Mean length | 11 |
| Min length | 11 |
Characters and Unicode
| Total characters | 11000 |
|---|---|
| Distinct characters | 11 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
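As an illustration of those properties, Python's standard `unicodedata.category` returns the two-letter Unicode general category of a character (`Nd` for Decimal Number, `Pd` for Dash Punctuation). This sketch tallies categories over the five ID_fatura values listed below under Sample. It is not pandas-profiling's own code, which may group or label categories differently.

```python
import unicodedata
from collections import Counter

# The five sample ID_fatura values listed under "Sample" in this section
sample_ids = ["750-67-8428", "226-31-3081", "631-41-3108", "123-19-1176", "373-73-7910"]

# Count the Unicode general category of every character in every value
category_counts = Counter(
    unicodedata.category(ch) for value in sample_ids for ch in value
)
print(category_counts)  # Counter({'Nd': 45, 'Pd': 10})
```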
Unique
| Unique | 1000 |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | 750-67-8428 |
|---|---|
| 2nd row | 226-31-3081 |
| 3rd row | 631-41-3108 |
| 4th row | 123-19-1176 |
| 5th row | 373-73-7910 |
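The totals above (9000 decimal digits and 2000 dashes across 1000 values of length 11) are consistent with every invoice ID following an `NNN-NN-NNNN` layout, as the sample rows do. The following is a minimal validation sketch that assumes this pattern, which the report does not state itself.

```python
import re

# Assumed invoice-ID layout, inferred from the sample rows: 3 digits, 2 digits, 4 digits.
# [0-9] is used instead of \d so that only ASCII digits are accepted.
INVOICE_ID = re.compile(r"[0-9]{3}-[0-9]{2}-[0-9]{4}")

def is_valid_invoice_id(value: str) -> bool:
    # fullmatch rejects strings with extra leading or trailing characters
    return INVOICE_ID.fullmatch(value) is not None

samples = ["750-67-8428", "226-31-3081", "631-41-3108", "123-19-1176", "373-73-7910"]
print(all(is_valid_invoice_id(s) for s in samples))  # True
```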
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 750-67-8428 | 1 | 0.1% |
| 642-61-4706 | 1 | 0.1% |
| 816-72-8853 | 1 | 0.1% |
| 491-38-3499 | 1 | 0.1% |
| 322-02-2271 | 1 | 0.1% |
| 842-29-4695 | 1 | 0.1% |
| 725-67-2480 | 1 | 0.1% |
| 641-51-2661 | 1 | 0.1% |
| 714-02-3114 | 1 | 0.1% |
| 518-17-2983 | 1 | 0.1% |
| Other values (990) | 990 | 99.0% |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
|---|---|---|
| 750-67-8428 | 1 | 0.1% |
| 252-56-2699 | 1 | 0.1% |
| 871-79-8483 | 1 | 0.1% |
| 848-62-7243 | 1 | 0.1% |
| 631-41-3108 | 1 | 0.1% |
| 123-19-1176 | 1 | 0.1% |
| 373-73-7910 | 1 | 0.1% |
| 699-14-3026 | 1 | 0.1% |
| 355-53-5943 | 1 | 0.1% |
| 315-22-5665 | 1 | 0.1% |
| Other values (990) | 990 | 99.0% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| - | 2000 | 18.2% |
| 2 | 957 | 8.7% |
| 6 | 954 | 8.7% |
| 1 | 950 | 8.6% |
| 8 | 944 | 8.6% |
| 5 | 927 | 8.4% |
| 4 | 918 | 8.3% |
| 3 | 909 | 8.3% |
| 7 | 895 | 8.1% |
| 0 | 809 | 7.4% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 9000 | 81.8% |
| Dash Punctuation | 2000 | 18.2% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 957 | 10.6% |
| 6 | 954 | 10.6% |
| 1 | 950 | 10.6% |
| 8 | 944 | 10.5% |
| 5 | 927 | 10.3% |
| 4 | 918 | 10.2% |
| 3 | 909 | 10.1% |
| 7 | 895 | 9.9% |
| 0 | 809 | 9.0% |
| 9 | 737 | 8.2% |
Dash Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| - | 2000 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 11000 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| - | 2000 | 18.2% |
| 2 | 957 | 8.7% |
| 6 | 954 | 8.7% |
| 1 | 950 | 8.6% |
| 8 | 944 | 8.6% |
| 5 | 927 | 8.4% |
| 4 | 918 | 8.3% |
| 3 | 909 | 8.3% |
| 7 | 895 | 8.1% |
| 0 | 809 | 7.4% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 11000 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| - | 2000 | 18.2% |
| 2 | 957 | 8.7% |
| 6 | 954 | 8.7% |
| 1 | 950 | 8.6% |
| 8 | 944 | 8.6% |
| 5 | 927 | 8.4% |
| 4 | 918 | 8.3% |
| 3 | 909 | 8.3% |
| 7 | 895 | 8.1% |
| 0 | 809 | 7.4% |
Filial
| Distinct | 3 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.9 KiB |

| A | 340 |
|---|---|
| B | 332 |
| C | 328 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 1000 |
|---|---|
| Distinct characters | 3 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | A |
|---|---|
| 2nd row | C |
| 3rd row | A |
| 4th row | A |
| 5th row | A |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| A | 340 | 34.0% |
| B | 332 | 33.2% |
| C | 328 | 32.8% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
|---|---|---|
| a | 340 | 34.0% |
| b | 332 | 33.2% |
| c | 328 | 32.8% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| A | 340 | 34.0% |
| B | 332 | 33.2% |
| C | 328 | 32.8% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Uppercase Letter | 1000 | 100.0% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| A | 340 | 34.0% |
| B | 332 | 33.2% |
| C | 328 | 32.8% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 1000 | 100.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| A | 340 | 34.0% |
| B | 332 | 33.2% |
| C | 328 | 32.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 1000 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| A | 340 | 34.0% |
| B | 332 | 33.2% |
| C | 328 | 32.8% |
Cidade
| Distinct | 3 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.9 KiB |

| Yangon | 340 |
|---|---|
| Mandalay | 332 |
| Naypyitaw | 328 |
Length
| Max length | 9 |
|---|---|
| Median length | 8 |
| Mean length | 7.648 |
| Min length | 6 |
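The mean length is the count-weighted average of the three city-name lengths (Yangon 6, Mandalay 8, Naypyitaw 9), using the counts in Common Values below. The numerator of that average is the 7648 total characters reported under Characters and Unicode.

```python
import statistics

# Counts from the Cidade "Common Values" table
counts = {"Yangon": 340, "Mandalay": 332, "Naypyitaw": 328}

# Weighted sum of name lengths = total number of characters in the column
total_chars = sum(len(city) * n for city, n in counts.items())
mean_length = total_chars / sum(counts.values())

# Median over the 1000 individual lengths
lengths = [len(city) for city, n in counts.items() for _ in range(n)]
median_length = statistics.median(lengths)

print(total_chars, mean_length, median_length)  # 7648 7.648 8.0
```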
Characters and Unicode
| Total characters | 7648 |
|---|---|
| Distinct characters | 14 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Yangon |
|---|---|
| 2nd row | Naypyitaw |
| 3rd row | Yangon |
| 4th row | Yangon |
| 5th row | Yangon |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Yangon | 340 | 34.0% |
| Mandalay | 332 | 33.2% |
| Naypyitaw | 328 | 32.8% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
|---|---|---|
| yangon | 340 | 34.0% |
| mandalay | 332 | 33.2% |
| naypyitaw | 328 | 32.8% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| a | 1992 | 26.0% |
| n | 1012 | 13.2% |
| y | 988 | 12.9% |
| Y | 340 | 4.4% |
| g | 340 | 4.4% |
| o | 340 | 4.4% |
| M | 332 | 4.3% |
| d | 332 | 4.3% |
| l | 332 | 4.3% |
| N | 328 | 4.3% |
| Other values (4) | 1312 | 17.2% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 6648 | 86.9% |
| Uppercase Letter | 1000 | 13.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| a | 1992 | 30.0% |
| n | 1012 | 15.2% |
| y | 988 | 14.9% |
| g | 340 | 5.1% |
| o | 340 | 5.1% |
| d | 332 | 5.0% |
| l | 332 | 5.0% |
| p | 328 | 4.9% |
| i | 328 | 4.9% |
| t | 328 | 4.9% |
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| Y | 340 | 34.0% |
| M | 332 | 33.2% |
| N | 328 | 32.8% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 7648 | 100.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| a | 1992 | 26.0% |
| n | 1012 | 13.2% |
| y | 988 | 12.9% |
| Y | 340 | 4.4% |
| g | 340 | 4.4% |
| o | 340 | 4.4% |
| M | 332 | 4.3% |
| d | 332 | 4.3% |
| l | 332 | 4.3% |
| N | 328 | 4.3% |
| Other values (4) | 1312 | 17.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 7648 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| a | 1992 | 26.0% |
| n | 1012 | 13.2% |
| y | 988 | 12.9% |
| Y | 340 | 4.4% |
| g | 340 | 4.4% |
| o | 340 | 4.4% |
| M | 332 | 4.3% |
| d | 332 | 4.3% |
| l | 332 | 4.3% |
| N | 328 | 4.3% |
| Other values (4) | 1312 | 17.2% |
Tipo de cliente
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.9 KiB |

| Member | 501 |
|---|---|
| Normal | 499 |
Length
| Max length | 6 |
|---|---|
| Median length | 6 |
| Mean length | 6 |
| Min length | 6 |
Characters and Unicode
| Total characters | 6000 |
|---|---|
| Distinct characters | 9 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Member |
|---|---|
| 2nd row | Normal |
| 3rd row | Normal |
| 4th row | Member |
| 5th row | Normal |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Member | 501 | 50.1% |
| Normal | 499 | 49.9% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
|---|---|---|
| member | 501 | 50.1% |
| normal | 499 | 49.9% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| e | 1002 | 16.7% |
| m | 1000 | 16.7% |
| r | 1000 | 16.7% |
| M | 501 | 8.3% |
| b | 501 | 8.3% |
| N | 499 | 8.3% |
| o | 499 | 8.3% |
| a | 499 | 8.3% |
| l | 499 | 8.3% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 5000 | 83.3% |
| Uppercase Letter | 1000 | 16.7% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| e | 1002 | 20.0% |
| m | 1000 | 20.0% |
| r | 1000 | 20.0% |
| b | 501 | 10.0% |
| o | 499 | 10.0% |
| a | 499 | 10.0% |
| l | 499 | 10.0% |
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| M | 501 | 50.1% |
| N | 499 | 49.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 6000 | 100.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| e | 1002 | 16.7% |
| m | 1000 | 16.7% |
| r | 1000 | 16.7% |
| M | 501 | 8.3% |
| b | 501 | 8.3% |
| N | 499 | 8.3% |
| o | 499 | 8.3% |
| a | 499 | 8.3% |
| l | 499 | 8.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 6000 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| e | 1002 | 16.7% |
| m | 1000 | 16.7% |
| r | 1000 | 16.7% |
| M | 501 | 8.3% |
| b | 501 | 8.3% |
| N | 499 | 8.3% |
| o | 499 | 8.3% |
| a | 499 | 8.3% |
| l | 499 | 8.3% |
Sexo
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.9 KiB |

| Female | 501 |
|---|---|
| Male | 499 |
Length
| Max length | 6 |
|---|---|
| Median length | 6 |
| Mean length | 5.002 |
| Min length | 4 |
Characters and Unicode
| Total characters | 5002 |
|---|---|
| Distinct characters | 6 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Female |
|---|---|
| 2nd row | Female |
| 3rd row | Male |
| 4th row | Male |
| 5th row | Male |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Female | 501 | 50.1% |
| Male | 499 | 49.9% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
|---|---|---|
| female | 501 | 50.1% |
| male | 499 | 49.9% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| e | 1501 | 30.0% |
| a | 1000 | 20.0% |
| l | 1000 | 20.0% |
| F | 501 | 10.0% |
| m | 501 | 10.0% |
| M | 499 | 10.0% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 4002 | 80.0% |
| Uppercase Letter | 1000 | 20.0% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| e | 1501 | 37.5% |
| a | 1000 | 25.0% |
| l | 1000 | 25.0% |
| m | 501 | 12.5% |
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| F | 501 | 50.1% |
| M | 499 | 49.9% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 5002 | 100.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| e | 1501 | 30.0% |
| a | 1000 | 20.0% |
| l | 1000 | 20.0% |
| F | 501 | 10.0% |
| m | 501 | 10.0% |
| M | 499 | 10.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 5002 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| e | 1501 | 30.0% |
| a | 1000 | 20.0% |
| l | 1000 | 20.0% |
| F | 501 | 10.0% |
| m | 501 | 10.0% |
| M | 499 | 10.0% |
Linha de produtos
| Distinct | 6 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.9 KiB |

| Fashion accessories | 178 |
|---|---|
| Food and beverages | 174 |
| Electronic accessories | 170 |
| Sports and travel | 166 |
| Home and lifestyle | 160 |
| Other values (1) | 152 |
Length
| Max length | 22 |
|---|---|
| Median length | 19 |
| Mean length | 18.54 |
| Min length | 17 |
Characters and Unicode
| Total characters | 18540 |
|---|---|
| Distinct characters | 25 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Health and beauty |
|---|---|
| 2nd row | Electronic accessories |
| 3rd row | Home and lifestyle |
| 4th row | Health and beauty |
| 5th row | Sports and travel |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Fashion accessories | 178 | 17.8% |
| Food and beverages | 174 | 17.4% |
| Electronic accessories | 170 | 17.0% |
| Sports and travel | 166 | 16.6% |
| Home and lifestyle | 160 | 16.0% |
| Health and beauty | 152 | 15.2% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
|---|---|---|
| and | 652 | 24.6% |
| accessories | 348 | 13.1% |
| fashion | 178 | 6.7% |
| food | 174 | 6.6% |
| beverages | 174 | 6.6% |
| electronic | 170 | 6.4% |
| sports | 166 | 6.3% |
| travel | 166 | 6.3% |
| home | 160 | 6.0% |
| lifestyle | 160 | 6.0% |
| Other values (2) | 304 | 11.5% |
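The word table above counts lower-cased words across all values, so "and" (652) is the sum of the four product lines that contain it. The sketch below rebuilds the word counts from the category counts. It assumes simple lower-casing plus whitespace splitting; pandas-profiling's exact tokeniser may differ.

```python
from collections import Counter

# Category counts from "Common Values" for Linha de produtos
category_counts = {
    "Fashion accessories": 178,
    "Food and beverages": 174,
    "Electronic accessories": 170,
    "Sports and travel": 166,
    "Home and lifestyle": 160,
    "Health and beauty": 152,
}

# Assumption: lower-case each value, then split on whitespace
word_counts = Counter()
for category, n in category_counts.items():
    for word in category.lower().split():
        word_counts[word] += n

total_words = sum(word_counts.values())
for word, n in word_counts.most_common(3):
    print(word, n, f"{n / total_words:.1%}")
```

The frequencies are relative to the total number of words (2652), not to the 1000 rows, which is why they sum to 100% across words rather than across product lines.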
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| e | 2338 | 12.6% |
| a | 1822 | 9.8% |
| s | 1722 | 9.3% |
| (space) | 1652 | 8.9% |
| o | 1370 | 7.4% |
| c | 1036 | 5.6% |
| r | 1024 | 5.5% |
| n | 1000 | 5.4% |
| t | 966 | 5.2% |
| i | 856 | 4.6% |
| Other values (15) | 4754 | 25.6% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 15888 | 85.7% |
| Space Separator | 1652 | 8.9% |
| Uppercase Letter | 1000 | 5.4% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| e | 2338 | 14.7% |
| a | 1822 | 11.5% |
| s | 1722 | 10.8% |
| o | 1370 | 8.6% |
| c | 1036 | 6.5% |
| r | 1024 | 6.4% |
| n | 1000 | 6.3% |
| t | 966 | 6.1% |
| i | 856 | 5.4% |
| d | 826 | 5.2% |
| Other values (10) | 2928 | 18.4% |
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| F | 352 | 35.2% |
| H | 312 | 31.2% |
| E | 170 | 17.0% |
| S | 166 | 16.6% |
Space Separator
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 1652 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 16888 | 91.1% |
| Common | 1652 | 8.9% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| e | 2338 | 13.8% |
| a | 1822 | 10.8% |
| s | 1722 | 10.2% |
| o | 1370 | 8.1% |
| c | 1036 | 6.1% |
| r | 1024 | 6.1% |
| n | 1000 | 5.9% |
| t | 966 | 5.7% |
| i | 856 | 5.1% |
| d | 826 | 4.9% |
| Other values (14) | 3928 | 23.3% |
Common
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 1652 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 18540 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| e | 2338 | 12.6% |
| a | 1822 | 9.8% |
| s | 1722 | 9.3% |
| (space) | 1652 | 8.9% |
| o | 1370 | 7.4% |
| c | 1036 | 5.6% |
| r | 1024 | 5.5% |
| n | 1000 | 5.4% |
| t | 966 | 5.2% |
| i | 856 | 4.6% |
| Other values (15) | 4754 | 25.6% |
Preco unitario
| Distinct | 943 |
|---|---|
| Distinct (%) | 94.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 55.67213 |

| Minimum | 10.08 |
|---|---|
| Maximum | 99.96 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 10.08 |
|---|---|
| 5-th percentile | 15.279 |
| Q1 | 32.875 |
| median | 55.23 |
| Q3 | 77.935 |
| 95-th percentile | 97.222 |
| Maximum | 99.96 |
| Range | 89.88 |
| Interquartile range (IQR) | 45.06 |
Descriptive statistics
| Standard deviation | 26.49462835 |
|---|---|
| Coefficient of variation (CV) | 0.4759047004 |
| Kurtosis | -1.218591428 |
| Mean | 55.67213 |
| Median Absolute Deviation (MAD) | 22.505 |
| Skewness | 0.007077447853 |
| Sum | 55672.13 |
| Variance | 701.9653313 |
| Monotonicity | Not monotonic |
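Several of these statistics are functions of the others: the coefficient of variation is standard deviation / mean, the variance is the squared standard deviation, and the range and IQR are differences of the quantiles. Below is a quick consistency check on the Preco unitario figures, using only values copied from the tables above.

```python
import math

# Values copied from the Preco unitario tables above
mean = 55.67213
std = 26.49462835
q1, q3 = 32.875, 77.935
minimum, maximum = 10.08, 99.96

cv = std / mean               # coefficient of variation
variance = std ** 2           # variance is the squared standard deviation
iqr = q3 - q1                 # interquartile range
value_range = maximum - minimum

print(round(cv, 7), round(variance, 4), round(iqr, 3), round(value_range, 2))
```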
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 83.77 | 3 | 0.3% |
| 39.62 | 2 | 0.2% |
| 24.74 | 2 | 0.2% |
| 19.15 | 2 | 0.2% |
| 73.47 | 2 | 0.2% |
| 95.54 | 2 | 0.2% |
| 78.31 | 2 | 0.2% |
| 26.26 | 2 | 0.2% |
| 89.48 | 2 | 0.2% |
| 72.88 | 2 | 0.2% |
| Other values (933) | 979 | 97.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 10.08 | 1 | 0.1% |
| 10.13 | 1 | 0.1% |
| 10.16 | 1 | 0.1% |
| 10.17 | 1 | 0.1% |
| 10.18 | 1 | 0.1% |
| 10.53 | 1 | 0.1% |
| 10.56 | 1 | 0.1% |
| 10.59 | 1 | 0.1% |
| 10.69 | 1 | 0.1% |
| 10.75 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 99.96 | 2 | 0.2% |
| 99.92 | 1 | 0.1% |
| 99.89 | 1 | 0.1% |
| 99.83 | 1 | 0.1% |
| 99.82 | 2 | 0.2% |
| 99.79 | 1 | 0.1% |
| 99.78 | 1 | 0.1% |
| 99.73 | 1 | 0.1% |
| 99.71 | 1 | 0.1% |
| 99.7 | 1 | 0.1% |
Quantidade
| Distinct | 10 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5.51 |

| Minimum | 1 |
|---|---|
| Maximum | 10 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 5 |
| Q3 | 8 |
| 95-th percentile | 10 |
| Maximum | 10 |
| Range | 9 |
| Interquartile range (IQR) | 5 |
Descriptive statistics
| Standard deviation | 2.923430595 |
|---|---|
| Coefficient of variation (CV) | 0.5305681661 |
| Kurtosis | -1.215547226 |
| Mean | 5.51 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 0.01294104802 |
| Sum | 5510 |
| Variance | 8.546446446 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=10)
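Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 10 | 119 | 11.9% |
| 1 | 112 | 11.2% |
| 4 | 109 | 10.9% |
| 7 | 102 | 10.2% |
| 5 | 102 | 10.2% |
| 6 | 98 | 9.8% |
| 9 | 92 | 9.2% |
| 2 | 91 | 9.1% |
| 3 | 90 | 9.0% |
| 8 | 85 | 8.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 112 | 11.2% |
| 2 | 91 | 9.1% |
| 3 | 90 | 9.0% |
| 4 | 109 | 10.9% |
| 5 | 102 | 10.2% |
| 6 | 98 | 9.8% |
| 7 | 102 | 10.2% |
| 8 | 85 | 8.5% |
| 9 | 92 | 9.2% |
| 10 | 119 | 11.9% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 10 | 119 | 11.9% |
| 9 | 92 | 9.2% |
| 8 | 85 | 8.5% |
| 7 | 102 | 10.2% |
| 6 | 98 | 9.8% |
| 5 | 102 | 10.2% |
| 4 | 109 | 10.9% |
| 3 | 90 | 9.0% |
| 2 | 91 | 9.1% |
| 1 | 112 | 11.2% |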
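Because Quantidade takes only ten integer values, its summary statistics can be rebuilt exactly from the frequency table. The sketch below uses Python's `statistics` module. The sample (n - 1 denominator) variance reproduces the reported 8.546446446, which suggests that the report uses the sample variance rather than the population variance. The `method="inclusive"` quartiles (linear interpolation) reproduce the reported Q1, median and Q3.

```python
import statistics

# Value counts from the Quantidade frequency table above
counts = {1: 112, 2: 91, 3: 90, 4: 109, 5: 102, 6: 98, 7: 102, 8: 85, 9: 92, 10: 119}

# Expand the counts back into the 1000 individual observations
data = [value for value, n in counts.items() for _ in range(n)]

mean = statistics.mean(data)
variance = statistics.variance(data)   # sample variance, n - 1 denominator
stdev = statistics.stdev(data)
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

print(len(data), mean, round(variance, 9), round(stdev, 9))
print(q1, median, q3)  # 3.0 5.0 8.0
```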
Imposto
| Distinct | 990 |
|---|---|
| Distinct (%) | 99.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15.379369 |

| Minimum | 0.5085 |
|---|---|
| Maximum | 49.65 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 0.5085 |
|---|---|
| 5-th percentile | 1.955725 |
| Q1 | 5.924875 |
| median | 12.088 |
| Q3 | 22.44525 |
| 95-th percentile | 39.1665 |
| Maximum | 49.65 |
| Range | 49.1415 |
| Interquartile range (IQR) | 16.520375 |
Descriptive statistics
| Standard deviation | 11.70882548 |
|---|---|
| Coefficient of variation (CV) | 0.7613332823 |
| Kurtosis | -0.0818847579 |
| Mean | 15.379369 |
| Median Absolute Deviation (MAD) | 7.50875 |
| Skewness | 0.892569805 |
| Sum | 15379.369 |
| Variance | 137.0965941 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 10.326 | 2 | 0.2% |
| 4.464 | 2 | 0.2% |
| 4.154 | 2 | 0.2% |
| 9.0045 | 2 | 0.2% |
| 22.428 | 2 | 0.2% |
| 39.48 | 2 | 0.2% |
| 10.3635 | 2 | 0.2% |
| 8.377 | 2 | 0.2% |
| 13.188 | 2 | 0.2% |
| 12.57 | 2 | 0.2% |
| Other values (980) | 980 | 98.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.5085 | 1 | 0.1% |
| 0.6045 | 1 | 0.1% |
| 0.627 | 1 | 0.1% |
| 0.639 | 1 | 0.1% |
| 0.699 | 1 | 0.1% |
| 0.767 | 1 | 0.1% |
| 0.7715 | 1 | 0.1% |
| 0.775 | 1 | 0.1% |
| 0.814 | 1 | 0.1% |
| 0.8875 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 49.65 | 1 | 0.1% |
| 49.49 | 1 | 0.1% |
| 49.26 | 1 | 0.1% |
| 48.75 | 1 | 0.1% |
| 48.69 | 1 | 0.1% |
| 48.685 | 1 | 0.1% |
| 48.605 | 1 | 0.1% |
| 47.79 | 1 | 0.1% |
| 47.72 | 1 | 0.1% |
| 45.325 | 1 | 0.1% |
Total
| Distinct | 990 |
|---|---|
| Distinct (%) | 99.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 322.96682 |

| Minimum | 10.68 |
|---|---|
| Maximum | 1042.65 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 10.68 |
|---|---|
| 5-th percentile | 41.074 |
| Q1 | 124.425 |
| median | 253.85 |
| Q3 | 471.35 |
| 95-th percentile | 822.501 |
| Maximum | 1042.65 |
| Range | 1031.97 |
| Interquartile range (IQR) | 346.925 |
Descriptive statistics
| Standard deviation | 245.8853975 |
|---|---|
| Coefficient of variation (CV) | 0.7613333083 |
| Kurtosis | -0.08188596017 |
| Mean | 322.96682 |
| Median Absolute Deviation (MAD) | 157.685 |
| Skewness | 0.8925696949 |
| Sum | 322966.82 |
| Variance | 60459.62872 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 216.85 | 2 | 0.2% |
| 93.74 | 2 | 0.2% |
| 87.23 | 2 | 0.2% |
| 189.09 | 2 | 0.2% |
| 470.99 | 2 | 0.2% |
| 829.08 | 2 | 0.2% |
| 217.63 | 2 | 0.2% |
| 175.92 | 2 | 0.2% |
| 276.95 | 2 | 0.2% |
| 263.97 | 2 | 0.2% |
| Other values (980) | 980 | 98.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 10.68 | 1 | 0.1% |
| 12.69 | 1 | 0.1% |
| 13.17 | 1 | 0.1% |
| 13.42 | 1 | 0.1% |
| 14.68 | 1 | 0.1% |
| 16.11 | 1 | 0.1% |
| 16.2 | 1 | 0.1% |
| 16.27 | 1 | 0.1% |
| 17.09 | 1 | 0.1% |
| 18.64 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1042.65 | 1 | 0.1% |
| 1039.29 | 1 | 0.1% |
| 1034.46 | 1 | 0.1% |
| 1023.75 | 1 | 0.1% |
| 1022.49 | 1 | 0.1% |
| 1022.38 | 1 | 0.1% |
| 1020.7 | 1 | 0.1% |
| 1003.59 | 1 | 0.1% |
| 1002.12 | 1 | 0.1% |
| 951.82 | 1 | 0.1% |
Data
Date
| Distinct | 89 |
|---|---|
| Distinct (%) | 8.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.9 KiB |

| Minimum | 2019-01-01 00:00:00 |
|---|---|
| Maximum | 2019-03-30 00:00:00 |
Histogram with fixed size bins (bins=50)
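The 89 distinct dates equal the number of calendar days from the minimum to the maximum, inclusive. Since every date falls in that interval, this implies that every day from 2019-01-01 to 2019-03-30 appears at least once (the report does not say so directly). A quick check with `datetime`:

```python
from datetime import date

first_day = date(2019, 1, 1)    # "Minimum" of Data
last_day = date(2019, 3, 30)    # "Maximum" of Data

# +1 because both endpoints are included in the span
days_in_span = (last_day - first_day).days + 1
print(days_in_span)  # 89
```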
Horario
| Distinct | 506 |
|---|---|
| Distinct (%) | 50.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.9 KiB |

| 19:48 | 7 |
|---|---|
| 14:42 | 7 |
| 17:38 | 6 |
| 17:16 | 5 |
| 11:40 | 5 |
| Other values (501) | 970 |
Length
| Max length | 5 |
|---|---|
| Median length | 5 |
| Mean length | 5 |
| Min length | 5 |
Characters and Unicode
| Total characters | 5000 |
|---|---|
| Distinct characters | 11 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 210 |
|---|---|
| Unique (%) | 21.0% |
Sample
| 1st row | 13:08 |
|---|---|
| 2nd row | 10:29 |
| 3rd row | 13:23 |
| 4th row | 20:33 |
| 5th row | 10:37 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 19:48 | 7 | 0.7% |
| 14:42 | 7 | 0.7% |
| 17:38 | 6 | 0.6% |
| 17:16 | 5 | 0.5% |
| 11:40 | 5 | 0.5% |
| 13:48 | 5 | 0.5% |
| 19:39 | 5 | 0.5% |
| 19:20 | 5 | 0.5% |
| 17:36 | 5 | 0.5% |
| 13:58 | 5 | 0.5% |
| Other values (496) | 945 | 94.5% |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
|---|---|---|
| 19:48 | 7 | 0.7% |
| 14:42 | 7 | 0.7% |
| 17:38 | 6 | 0.6% |
| 17:36 | 5 | 0.5% |
| 19:44 | 5 | 0.5% |
| 11:51 | 5 | 0.5% |
| 10:11 | 5 | 0.5% |
| 13:58 | 5 | 0.5% |
| 19:30 | 5 | 0.5% |
| 19:20 | 5 | 0.5% |
| Other values (496) | 945 | 94.5% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1250 | 25.0% |
| : | 1000 | 20.0% |
| 2 | 441 | 8.8% |
| 0 | 437 | 8.7% |
| 3 | 378 | 7.6% |
| 4 | 376 | 7.5% |
| 5 | 354 | 7.1% |
| 8 | 216 | 4.3% |
| 9 | 200 | 4.0% |
| 6 | 184 | 3.7% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 4000 | 80.0% |
| Other Punctuation | 1000 | 20.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1250 | 31.2% |
| 2 | 441 | 11.0% |
| 0 | 437 | 10.9% |
| 3 | 378 | 9.4% |
| 4 | 376 | 9.4% |
| 5 | 354 | 8.8% |
| 8 | 216 | 5.4% |
| 9 | 200 | 5.0% |
| 6 | 184 | 4.6% |
| 7 | 164 | 4.1% |
Other Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| : | 1000 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 5000 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1250 | 25.0% |
| : | 1000 | 20.0% |
| 2 | 441 | 8.8% |
| 0 | 437 | 8.7% |
| 3 | 378 | 7.6% |
| 4 | 376 | 7.5% |
| 5 | 354 | 7.1% |
| 8 | 216 | 4.3% |
| 9 | 200 | 4.0% |
| 6 | 184 | 3.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 5000 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1250 | 25.0% |
| : | 1000 | 20.0% |
| 2 | 441 | 8.8% |
| 0 | 437 | 8.7% |
| 3 | 378 | 7.6% |
| 4 | 376 | 7.5% |
| 5 | 354 | 7.1% |
| 8 | 216 | 4.3% |
| 9 | 200 | 4.0% |
| 6 | 184 | 3.7% |
Pagamento
| Distinct | 3 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.9 KiB |

| Ewallet | 345 |
|---|---|
| Cash | 344 |
| Credit card | 311 |
Length
| Max length | 11 |
|---|---|
| Median length | 7 |
| Mean length | 7.212 |
| Min length | 4 |
Characters and Unicode
| Total characters | 7212 |
|---|---|
| Distinct characters | 14 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Ewallet |
|---|---|
| 2nd row | Cash |
| 3rd row | Credit card |
| 4th row | Ewallet |
| 5th row | Ewallet |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Ewallet | 345 | 34.5% |
| Cash | 344 | 34.4% |
| Credit card | 311 | 31.1% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
|---|---|---|
| ewallet | 345 | 26.3% |
| cash | 344 | 26.2% |
| credit | 311 | 23.7% |
| card | 311 | 23.7% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| a | 1000 | 13.9% |
| l | 690 | 9.6% |
| e | 656 | 9.1% |
| t | 656 | 9.1% |
| C | 655 | 9.1% |
| r | 622 | 8.6% |
| d | 622 | 8.6% |
| E | 345 | 4.8% |
| w | 345 | 4.8% |
| s | 344 | 4.8% |
| Other values (4) | 1277 | 17.7% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 5901 | 81.8% |
| Uppercase Letter | 1000 | 13.9% |
| Space Separator | 311 | 4.3% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| a | 1000 | 16.9% |
| l | 690 | 11.7% |
| e | 656 | 11.1% |
| t | 656 | 11.1% |
| r | 622 | 10.5% |
| d | 622 | 10.5% |
| w | 345 | 5.8% |
| s | 344 | 5.8% |
| h | 344 | 5.8% |
| i | 311 | 5.3% |
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| C | 655 | 65.5% |
| E | 345 | 34.5% |
Space Separator
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 311 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 6901 | 95.7% |
| Common | 311 | 4.3% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| a | 1000 | 14.5% |
| l | 690 | 10.0% |
| e | 656 | 9.5% |
| t | 656 | 9.5% |
| C | 655 | 9.5% |
| r | 622 | 9.0% |
| d | 622 | 9.0% |
| E | 345 | 5.0% |
| w | 345 | 5.0% |
| s | 344 | 5.0% |
| Other values (3) | 966 | 14.0% |
Common
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 311 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 7212 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| a | 1000 | 13.9% |
| l | 690 | 9.6% |
| e | 656 | 9.1% |
| t | 656 | 9.1% |
| C | 655 | 9.1% |
| r | 622 | 8.6% |
| d | 622 | 8.6% |
| E | 345 | 4.8% |
| w | 345 | 4.8% |
| s | 344 | 4.8% |
| Other values (4) | 1277 | 17.7% |
Custo mercadoria
| Distinct | 990 |
|---|---|
| Distinct (%) | 99.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 307.58738 |

| Minimum | 10.17 |
|---|---|
| Maximum | 993 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 10.17 |
|---|---|
| 5-th percentile | 39.1145 |
| Q1 | 118.4975 |
| median | 241.76 |
| Q3 | 448.905 |
| 95-th percentile | 783.33 |
| Maximum | 993 |
| Range | 982.83 |
| Interquartile range (IQR) | 330.4075 |
Descriptive statistics
| Standard deviation | 234.1765096 |
|---|---|
| Coefficient of variation (CV) | 0.7613332823 |
| Kurtosis | -0.0818847579 |
| Mean | 307.58738 |
| Median Absolute Deviation (MAD) | 150.175 |
| Skewness | 0.892569805 |
| Sum | 307587.38 |
| Variance | 54838.63766 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 206.52 | 2 | 0.2% |
| 89.28 | 2 | 0.2% |
| 83.08 | 2 | 0.2% |
| 180.09 | 2 | 0.2% |
| 448.56 | 2 | 0.2% |
| 789.6 | 2 | 0.2% |
| 207.27 | 2 | 0.2% |
| 167.54 | 2 | 0.2% |
| 263.76 | 2 | 0.2% |
| 251.4 | 2 | 0.2% |
| Other values (980) | 980 | 98.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 10.17 | 1 | 0.1% |
| 12.09 | 1 | 0.1% |
| 12.54 | 1 | 0.1% |
| 12.78 | 1 | 0.1% |
| 13.98 | 1 | 0.1% |
| 15.34 | 1 | 0.1% |
| 15.43 | 1 | 0.1% |
| 15.5 | 1 | 0.1% |
| 16.28 | 1 | 0.1% |
| 17.75 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 993 | 1 | 0.1% |
| 989.8 | 1 | 0.1% |
| 985.2 | 1 | 0.1% |
| 975 | 1 | 0.1% |
| 973.8 | 1 | 0.1% |
| 973.7 | 1 | 0.1% |
| 972.1 | 1 | 0.1% |
| 955.8 | 1 | 0.1% |
| 954.4 | 1 | 0.1% |
| 906.5 | 1 | 0.1% |
Porcentagem margem bruta
| Distinct | 1 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.9 KiB |

| 4.761904762 | 1000 |
|---|---|
Length
| Max length | 11 |
|---|---|
| Median length | 11 |
| Mean length | 11 |
| Min length | 11 |
Characters and Unicode
| Total characters | 11000 |
|---|---|
| Distinct characters | 8 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 4.761904762 |
|---|---|
| 2nd row | 4.761904762 |
| 3rd row | 4.761904762 |
| 4th row | 4.761904762 |
| 5th row | 4.761904762 |
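The constant 4.761904762 equals 100/21. That is the gross income expressed as a percentage of the total if Renda bruta = 0.05 × Custo mercadoria and Total = 1.05 × Custo mercadoria, so the margin is 0.05 / 1.05. Those relationships are inferred from the numeric sections of this report rather than stated in it. Under that assumption, the arithmetic is:

```python
from fractions import Fraction

# Assumed relationships (inferred from this report's summary statistics):
#   Renda bruta = 0.05 * Custo mercadoria
#   Total       = 1.05 * Custo mercadoria
margin = Fraction(5, 100) / Fraction(105, 100) * 100  # gross margin in percent

print(margin)                  # 100/21
print(f"{float(margin):.9f}")  # 4.761904762, as stored in the column
```

A margin that does not vary from row to row also explains why the column is flagged as Constant and why its correlations with the other columns carry little information.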
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 4.761904762 | 1000 | 100.0% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
|---|---|---|
| 4.761904762 | 1000 | 100.0% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 2000 | 18.2% |
| 7 | 2000 | 18.2% |
| 6 | 2000 | 18.2% |
| . | 1000 | 9.1% |
| 1 | 1000 | 9.1% |
| 9 | 1000 | 9.1% |
| 0 | 1000 | 9.1% |
| 2 | 1000 | 9.1% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 10000 | 90.9% |
| Other Punctuation | 1000 | 9.1% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 2000 | 20.0% |
| 7 | 2000 | 20.0% |
| 6 | 2000 | 20.0% |
| 1 | 1000 | 10.0% |
| 9 | 1000 | 10.0% |
| 0 | 1000 | 10.0% |
| 2 | 1000 | 10.0% |
Other Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| . | 1000 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 11000 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 2000 | 18.2% |
| 7 | 2000 | 18.2% |
| 6 | 2000 | 18.2% |
| . | 1000 | 9.1% |
| 1 | 1000 | 9.1% |
| 9 | 1000 | 9.1% |
| 0 | 1000 | 9.1% |
| 2 | 1000 | 9.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 11000 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 2000 | 18.2% |
| 7 | 2000 | 18.2% |
| 6 | 2000 | 18.2% |
| . | 1000 | 9.1% |
| 1 | 1000 | 9.1% |
| 9 | 1000 | 9.1% |
| 0 | 1000 | 9.1% |
| 2 | 1000 | 9.1% |
Renda bruta
| Distinct | 990 |
|---|---|
| Distinct (%) | 99.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15.379369 |

| Minimum | 0.5085 |
|---|---|
| Maximum | 49.65 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 0.5085 |
|---|---|
| 5-th percentile | 1.955725 |
| Q1 | 5.924875 |
| median | 12.088 |
| Q3 | 22.44525 |
| 95-th percentile | 39.1665 |
| Maximum | 49.65 |
| Range | 49.1415 |
| Interquartile range (IQR) | 16.520375 |
Descriptive statistics
| Standard deviation | 11.70882548 |
|---|---|
| Coefficient of variation (CV) | 0.7613332823 |
| Kurtosis | -0.0818847579 |
| Mean | 15.379369 |
| Median Absolute Deviation (MAD) | 7.50875 |
| Skewness | 0.892569805 |
| Sum | 15379.369 |
| Variance | 137.0965941 |
| Monotonicity | Not monotonic |
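The descriptive statistics above can be sketched with the standard library. This assumes pandas-style estimators, which is an assumption about how the report computes them: sample standard deviation (ddof=1), CV as standard deviation divided by mean, MAD as the median of absolute deviations from the median, and the bias-adjusted skewness and excess kurtosis formulas used by pandas' `Series.skew` and `Series.kurt`. Excess kurtosis would explain why the values reported here are negative. `descriptive_stats` is a hypothetical helper and the toy data is not from this dataset.

```python
import math
import statistics


def descriptive_stats(xs):
    """Pandas-style descriptive statistics (needs at least 4 values)."""
    n = len(xs)
    mean = sum(xs) / n
    std = statistics.stdev(xs)  # sample standard deviation, ddof=1
    median = statistics.median(xs)
    dev = [x - mean for x in xs]
    # Sums (not means) of powered deviations, as in pandas' formulas.
    m2 = sum(d ** 2 for d in dev)
    m3 = sum(d ** 3 for d in dev)
    m4 = sum(d ** 4 for d in dev)
    # Adjusted Fisher-Pearson skewness.
    skew = n * math.sqrt(n - 1) / (n - 2) * m3 / m2 ** 1.5
    # Bias-adjusted excess kurtosis (0 for a normal distribution).
    kurt = (n * (n + 1) * (n - 1) * m4 / ((n - 2) * (n - 3) * m2 ** 2)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "Mean": mean,
        "Standard deviation": std,
        "Variance": std ** 2,
        "Coefficient of variation (CV)": std / mean,
        "Median Absolute Deviation (MAD)": statistics.median(abs(x - median) for x in xs),
        "Skewness": skew,
        "Kurtosis": kurt,
        "Sum": sum(xs),
    }


# Toy data for illustration only.
example_stats = descriptive_stats([1, 2, 3, 4, 10])
```

For the toy data the variance is 12.5, the MAD is 1, and the single large value produces a positive skewness of about 1.70.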
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 10.326 | 2 | 0.2% |
| 4.464 | 2 | 0.2% |
| 4.154 | 2 | 0.2% |
| 9.0045 | 2 | 0.2% |
| 22.428 | 2 | 0.2% |
| 39.48 | 2 | 0.2% |
| 10.3635 | 2 | 0.2% |
| 8.377 | 2 | 0.2% |
| 13.188 | 2 | 0.2% |
| 12.57 | 2 | 0.2% |
| Other values (980) | 980 | 98.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.5085 | 1 | 0.1% |
| 0.6045 | 1 | 0.1% |
| 0.627 | 1 | 0.1% |
| 0.639 | 1 | 0.1% |
| 0.699 | 1 | 0.1% |
| 0.767 | 1 | 0.1% |
| 0.7715 | 1 | 0.1% |
| 0.775 | 1 | 0.1% |
| 0.814 | 1 | 0.1% |
| 0.8875 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 49.65 | 1 | 0.1% |
| 49.49 | 1 | 0.1% |
| 49.26 | 1 | 0.1% |
| 48.75 | 1 | 0.1% |
| 48.69 | 1 | 0.1% |
| 48.685 | 1 | 0.1% |
| 48.605 | 1 | 0.1% |
| 47.79 | 1 | 0.1% |
| 47.72 | 1 | 0.1% |
| 45.325 | 1 | 0.1% |
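The frequency and extreme-value tables above can be built with a counter. A minimal sketch, assuming (as the "Other values (980) | 980" and "Other values (51) | 781" rows suggest) that the label counts the remaining distinct values while the Count column sums their observations; `frequency_tables` is a hypothetical helper and the toy data is not from this dataset.

```python
from collections import Counter


def frequency_tables(xs, top=10):
    """Common values plus minimum/maximum extreme values, report-style."""
    n = len(xs)
    counts = Counter(xs)
    common = counts.most_common(top)
    shown = sum(c for _, c in common)
    other_distinct = len(counts) - len(common)  # distinct values not listed
    rows = [(v, c, f"{100 * c / n:.1f}%") for v, c in common]
    if other_distinct:
        rest = n - shown  # observations covered by the unlisted values
        rows.append((f"Other values ({other_distinct})", rest, f"{100 * rest / n:.1f}%"))
    distinct = sorted(counts)
    return {
        "distinct": len(counts),
        "distinct_pct": 100 * len(counts) / n,
        "common": rows,
        "min": [(v, counts[v]) for v in distinct[:top]],
        "max": [(v, counts[v]) for v in reversed(distinct[-top:])],
    }


# Toy data for illustration only.
tables = frequency_tables([1, 1, 1, 2, 2, 3, 4, 5], top=2)
```

With 1000 observations and 990 distinct Renda bruta values, the same logic gives Distinct (%) = 99.0% and an "Other values (980)" row covering 98.0% of the rows.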
Nota da experiência de compra
| Distinct | 61 |
|---|---|
| Distinct (%) | 6.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.9727 |
| Minimum | 4 |
| Maximum | 10 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Minimum | 4 |
|---|---|
| 5-th percentile | 4.295 |
| Q1 | 5.5 |
| median | 7 |
| Q3 | 8.5 |
| 95-th percentile | 9.7 |
| Maximum | 10 |
| Range | 6 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 1.718580294 |
|---|---|
| Coefficient of variation (CV) | 0.2464727142 |
| Kurtosis | -1.151586839 |
| Mean | 6.9727 |
| Median Absolute Deviation (MAD) | 1.5 |
| Skewness | 0.009009648766 |
| Sum | 6972.7 |
| Variance | 2.953518228 |
| Monotonicity | Not monotonic |
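Several figures in the Nota da experiência de compra summary are arithmetic consequences of others, which gives a quick sanity check on the report. A minimal sketch using only the reported values; the 0.1 rating step is an assumption drawn from the one-decimal values in the frequency tables, and `reported`, `checks` and `grid_points` are hypothetical names.

```python
import math

# Values copied from the Nota da experiência de compra summary above.
reported = {
    "mean": 6.9727, "std": 1.718580294, "variance": 2.953518228,
    "cv": 0.2464727142, "q1": 5.5, "q3": 8.5, "iqr": 3,
    "min": 4, "max": 10, "range": 6, "sum": 6972.7, "n": 1000,
}

checks = {
    "variance == std**2": math.isclose(reported["std"] ** 2, reported["variance"], rel_tol=1e-8),
    "cv == std / mean": math.isclose(reported["std"] / reported["mean"], reported["cv"], rel_tol=1e-8),
    "iqr == q3 - q1": reported["q3"] - reported["q1"] == reported["iqr"],
    "range == max - min": reported["max"] - reported["min"] == reported["range"],
    "sum == mean * n": math.isclose(reported["mean"] * reported["n"], reported["sum"]),
}

# Assumption: ratings use a 0.1 step, so 4.0 to 10.0 holds 61 possible values,
# which matches the reported Distinct count of 61.
grid_points = round((reported["max"] - reported["min"]) / 0.1) + 1

for name, ok in checks.items():
    print(name, ok)
```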
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 26 | 2.6% |
| 6.6 | 24 | 2.4% |
| 4.2 | 22 | 2.2% |
| 9.5 | 22 | 2.2% |
| 6.5 | 21 | 2.1% |
| 5 | 21 | 2.1% |
| 6.2 | 21 | 2.1% |
| 8 | 21 | 2.1% |
| 5.1 | 21 | 2.1% |
| 7.6 | 20 | 2.0% |
| Other values (51) | 781 | 78.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 11 | 1.1% |
| 4.1 | 17 | 1.7% |
| 4.2 | 22 | 2.2% |
| 4.3 | 18 | 1.8% |
| 4.4 | 17 | 1.7% |
| 4.5 | 17 | 1.7% |
| 4.6 | 8 | 0.8% |
| 4.7 | 12 | 1.2% |
| 4.8 | 13 | 1.3% |
| 4.9 | 18 | 1.8% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 10 | 5 | 0.5% |
| 9.9 | 16 | 1.6% |
| 9.8 | 19 | 1.9% |
| 9.7 | 14 | 1.4% |
| 9.6 | 17 | 1.7% |
| 9.5 | 22 | 2.2% |
| 9.4 | 12 | 1.2% |
| 9.3 | 16 | 1.6% |
| 9.2 | 16 | 1.6% |
| 9.1 | 14 | 1.4% |
Auto
The auto setting is an easily interpretable pairwise column metric based on the following mapping of variable-type pairs to methods: categorical-categorical: Cramér's V; numerical-categorical: Cramér's V (using a discretized numerical column); numerical-numerical: Spearman's ρ. This configuration uses the most suitable method for each pair of columns.
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
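The recipe above amounts to ranking each variable and then computing Pearson's r on the ranks. A minimal sketch in plain Python; giving tied values the average of their positions is an assumption here (it is the usual convention), and `average_ranks`, `pearson` and `spearman` are hypothetical helpers.

```python
import math


def average_ranks(xs):
    """Rank values 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Covariance over the product of standard deviations (1/n factors cancel)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman(xs, ys):
    return pearson(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # nonlinear but strictly increasing
print(spearman(x, y), pearson(x, y))
```

Because y is a strictly increasing function of x, ρ is 1 while Pearson's r on the raw values stays below 1 (about 0.94).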
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for an exactly linear relationship the steepness of the line does not affect r: any positive slope gives r = +1 and any negative slope gives r = -1. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
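The invariance property described above can be checked numerically. A minimal sketch with a hypothetical `pearson` helper and toy data: shifting and positively rescaling either variable leaves r unchanged, a negative scale flips its sign, and an exact line gives r = 1 whatever its (positive) slope.

```python
import math


def pearson(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]

r = pearson(x, y)
# Separate location shifts and positive rescalings of each variable.
r_shifted_scaled = pearson([10 * v + 3 for v in x], [0.5 * v - 7 for v in y])
# A negative scale factor flips the sign of r.
r_flipped = pearson([-v for v in x], y)
# An exact line with a shallow positive slope still gives r = 1.
r_line = pearson(x, [0.2 * v + 1 for v in x])
```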
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations; τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2. This is the τ-a form; when ties are present, a tie-corrected variant (τ-b) is commonly used instead.
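The pair-counting definition above translates directly into code. A minimal O(n²) sketch of τ-a with a hypothetical `kendall_tau_a` helper; it does not apply the τ-b tie correction.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / (n(n-1)/2), counting every pair once."""
    n = len(xs)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1  # both variables order the pair the same way
        elif s < 0:
            discordant += 1  # the variables order the pair oppositely
        # s == 0 is a tie: counted in neither group, but still in the denominator
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

Of the 6 pairs in the example, only (2, 3) is discordant, so τ = (5 - 1) / 6 ≈ 0.667.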
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for φk.
Missing values
A simple visualization of nullity by column.
First rows
| | ID_fatura | Filial | Cidade | Tipo de cliente | Sexo | Linha de produtos | Preco unitario | Quantidade | Imposto | Total | Data | Horario | Pagamento | Custo mercadoria | Porcentagem margem bruta | Renda bruta | Nota da experiência de compra |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 750-67-8428 | A | Yangon | Member | Female | Health and beauty | 74.69 | 7 | 26.1415 | 548.97 | 2019-01-05 | 13:08 | Ewallet | 522.83 | 4.761905 | 26.1415 | 9.1 |
| 1 | 226-31-3081 | C | Naypyitaw | Normal | Female | Electronic accessories | 15.28 | 5 | 3.8200 | 80.22 | 2019-03-08 | 10:29 | Cash | 76.40 | 4.761905 | 3.8200 | 9.6 |
| 2 | 631-41-3108 | A | Yangon | Normal | Male | Home and lifestyle | 46.33 | 7 | 16.2155 | 340.53 | 2019-03-03 | 13:23 | Credit card | 324.31 | 4.761905 | 16.2155 | 7.4 |
| 3 | 123-19-1176 | A | Yangon | Member | Male | Health and beauty | 58.22 | 8 | 23.2880 | 489.05 | 2019-01-27 | 20:33 | Ewallet | 465.76 | 4.761905 | 23.2880 | 8.4 |
| 4 | 373-73-7910 | A | Yangon | Normal | Male | Sports and travel | 86.31 | 7 | 30.2085 | 634.38 | 2019-02-08 | 10:37 | Ewallet | 604.17 | 4.761905 | 30.2085 | 5.3 |
| 5 | 699-14-3026 | C | Naypyitaw | Normal | Male | Electronic accessories | 85.39 | 7 | 29.8865 | 627.62 | 2019-03-25 | 18:30 | Ewallet | 597.73 | 4.761905 | 29.8865 | 4.1 |
| 6 | 355-53-5943 | A | Yangon | Member | Female | Electronic accessories | 68.84 | 6 | 20.6520 | 433.69 | 2019-02-25 | 14:36 | Ewallet | 413.04 | 4.761905 | 20.6520 | 5.8 |
| 7 | 315-22-5665 | C | Naypyitaw | Normal | Female | Home and lifestyle | 73.56 | 10 | 36.7800 | 772.38 | 2019-02-24 | 11:38 | Ewallet | 735.60 | 4.761905 | 36.7800 | 8.0 |
| 8 | 665-32-9167 | A | Yangon | Member | Female | Health and beauty | 36.26 | 2 | 3.6260 | 76.15 | 2019-01-10 | 17:15 | Credit card | 72.52 | 4.761905 | 3.6260 | 7.2 |
| 9 | 692-92-5582 | B | Mandalay | Member | Female | Food and beverages | 54.84 | 3 | 8.2260 | 172.75 | 2019-02-20 | 13:27 | Credit card | 164.52 | 4.761905 | 8.2260 | 5.9 |
Last rows
| | ID_fatura | Filial | Cidade | Tipo de cliente | Sexo | Linha de produtos | Preco unitario | Quantidade | Imposto | Total | Data | Horario | Pagamento | Custo mercadoria | Porcentagem margem bruta | Renda bruta | Nota da experiência de compra |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 990 | 886-18-2897 | A | Yangon | Normal | Female | Food and beverages | 56.56 | 5 | 14.1400 | 296.94 | 2019-03-22 | 19:06 | Credit card | 282.80 | 4.761905 | 14.1400 | 4.5 |
| 991 | 602-16-6955 | B | Mandalay | Normal | Female | Sports and travel | 76.60 | 10 | 38.3000 | 804.30 | 2019-01-24 | 18:10 | Ewallet | 766.00 | 4.761905 | 38.3000 | 6.0 |
| 992 | 745-74-0715 | A | Yangon | Normal | Male | Electronic accessories | 58.03 | 2 | 5.8030 | 121.86 | 2019-03-10 | 20:46 | Ewallet | 116.06 | 4.761905 | 5.8030 | 8.8 |
| 993 | 690-01-6631 | B | Mandalay | Normal | Male | Fashion accessories | 17.49 | 10 | 8.7450 | 183.64 | 2019-02-22 | 18:35 | Ewallet | 174.90 | 4.761905 | 8.7450 | 6.6 |
| 994 | 652-49-6720 | C | Naypyitaw | Member | Female | Electronic accessories | 60.95 | 1 | 3.0475 | 64.00 | 2019-02-18 | 11:40 | Ewallet | 60.95 | 4.761905 | 3.0475 | 5.9 |
| 995 | 233-67-5758 | C | Naypyitaw | Normal | Male | Health and beauty | 40.35 | 1 | 2.0175 | 42.37 | 2019-01-29 | 13:46 | Ewallet | 40.35 | 4.761905 | 2.0175 | 6.2 |
| 996 | 303-96-2227 | B | Mandalay | Normal | Female | Home and lifestyle | 97.38 | 10 | 48.6900 | 1022.49 | 2019-03-02 | 17:16 | Ewallet | 973.80 | 4.761905 | 48.6900 | 4.4 |
| 997 | 727-02-1313 | A | Yangon | Member | Male | Food and beverages | 31.84 | 1 | 1.5920 | 33.43 | 2019-02-09 | 13:22 | Cash | 31.84 | 4.761905 | 1.5920 | 7.7 |
| 998 | 347-56-2442 | A | Yangon | Normal | Male | Home and lifestyle | 65.82 | 1 | 3.2910 | 69.11 | 2019-02-22 | 15:33 | Cash | 65.82 | 4.761905 | 3.2910 | 4.1 |
| 999 | 849-09-3807 | A | Yangon | Member | Female | Fashion accessories | 88.34 | 7 | 30.9190 | 649.30 | 2019-02-18 | 13:28 | Cash | 618.38 | 4.761905 | 30.9190 | 6.6 |
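The rows above suggest that several numeric columns are deterministic functions of each other, which would explain both the constant Porcentagem margem bruta value and the strong correlations among the money columns. A minimal sketch that checks a hypothesised set of relationships on five of the rows shown; the relationships (cost = unit price × quantity, tax = 5% of cost, total = cost + tax, gross income = tax) are inferred from these rows, not documented, and `check_row` is a hypothetical helper.

```python
import math

# Rows copied from the First rows / Last rows tables above:
# (ID_fatura, Preco unitario, Quantidade, Imposto, Total, Custo mercadoria, Renda bruta)
rows = [
    ("750-67-8428", 74.69, 7, 26.1415, 548.97, 522.83, 26.1415),
    ("226-31-3081", 15.28, 5, 3.8200, 80.22, 76.40, 3.8200),
    ("315-22-5665", 73.56, 10, 36.7800, 772.38, 735.60, 36.7800),
    ("303-96-2227", 97.38, 10, 48.6900, 1022.49, 973.80, 48.6900),
    ("727-02-1313", 31.84, 1, 1.5920, 33.43, 31.84, 1.5920),
]


def check_row(preco, qtd, imposto, total, custo, renda):
    """True if a row fits the hypothesised column relationships."""
    return (
        math.isclose(custo, preco * qtd, abs_tol=0.005)          # cost = unit price x quantity
        and math.isclose(imposto, 0.05 * custo, abs_tol=1e-4)    # tax = 5% of cost
        and math.isclose(total, custo + imposto, abs_tol=0.005)  # total = cost + tax, rounded to cents
        and math.isclose(renda, imposto, abs_tol=1e-9)           # gross income equals tax
    )


all_fit = all(check_row(*row[1:]) for row in rows)

# Under these relationships the gross margin (gross income over unrounded total)
# is 100 * 0.05 / 1.05 = 100 / 21 for every row.
row_margins = [100 * renda / (custo + imposto)
               for (_id, _preco, _qtd, imposto, _total, custo, renda) in rows]
margin = 100 * 0.05 / 1.05
print(all_fit, round(margin, 9))
```

If the hypothesis holds, 100/21 ≈ 4.761904762 matches the constant value reported for Porcentagem margem bruta.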